Abstract: We address a sequential decision problem that arises 
in the computation of symmetric Boolean functions of distributed 
data. We consider a collocated network, where each node's 
transmissions can be heard by every other node. Each node 
has a Boolean measurement and we wish to compute a given 
Boolean function of these measurements. We suppose that the 
measurements are independent and Bernoulli distributed. Thus, 
the problem of optimal computation becomes the problem of 
optimally ordering nodes' transmissions so as to minimize the 
total expected number of bits. 

We solve the ordering problem for the class of Boolean 
threshold functions. The optimal ordering is dynamic, i.e., it 
could potentially depend on the values of previously transmitted 
bits. Further, it depends only on the ordering of the marginal 
probabilities, but not on their exact values. This provides an 
elegant structure for the optimal strategy. For the case where each 
node has a block of measurements, the problem is significantly 
harder, and we conjecture the optimal strategy. 

I. INTRODUCTION 

Most sensor network applications are typically interested 
only in computing some relevant function of the correlated 
data at distributed sensors. For example, one might want to 
compute the mean temperature for environmental monitoring, 
or the maximum temperature in fire alarm systems. On the 
other hand, sensor nodes are severely limited in terms of 
power and bandwidth, and are generating enormous quantities 
of data. Thus, we seek efficient in-network computation and 
communication strategies for the function of interest. 

Computing and communicating functions of distributed data 
presents several challenges. On the one hand, the wireless 
medium being a broadcast medium, nodes have to deal with 
interference from other transmissions. On the other hand, 
nodes can exploit these overheard transmissions, and the 
structure of the function to be computed, to achieve a more 
efficient description of their own data. Moreover, the strategy 
for computation may benefit from interactive information 
exchange between nodes. 

We consider a collocated network where each node's trans- 
missions can be heard by every other node. At most one node 
is allowed to transmit successfully at any time. Each node has 
a Boolean variable and we focus on the specific problem of 
symmetric Boolean function computation. We adopt a deter- 
ministic formulation of the problem of function computation, 
allowing zero error. We suppose that node measurements 
are independent and distributed according to given marginal 
Bernoulli distributions. In this paper, we focus on optimal 
strategies for Boolean threshold functions, which are equal 
to 1 if and only if the number of nodes with measurement 
1 is greater than a certain threshold. The set of admissible 
strategies includes all interactive strategies, where a node may 
exchange several messages with other nodes. 

In the case where each node has a single bit, the commu- 
nication problem is rendered trivial, since it is optimal for 
the transmitting node to simply indicate its bit value. Thus, 
it only remains to determine the optimal ordering of nodes' 
transmissions so as to minimize the expected number of bits 
exchanged. For the class of Boolean threshold functions, we 
present a simple indexing policy for ordering the transmissions 
and prove its optimality. The optimal policy is dynamic, possi- 
bly depending on the previously transmitted bits. Further, the 
optimal policy depends only on the ordering of the marginal 
probabilities, but surprisingly not on their values. 

The problem of optimally ordering transmissions of nodes 
is a sequential decision problem and can indeed be solved by 
dynamic programming. However, this would require solving 
the dynamic program for all thresholds and all probability 
distributions, which is computationally hard. We avoid this, 
and establish a more insightful solution, in the form of a simple 
rule defining the optimal policy. 

In Section III, we formulate the problem of single instance 
computation, and derive the resulting dynamic programming 
equation. We then propose the indexing policy and present 
a detailed proof of optimality, by induction on the number of 
nodes in the network. In Section IV, we consider the extension 
to the case of block computation, where each node has a 
block of measurements and we are allowed block coding. 
This problem is significantly harder, and we conjecture the 
structure of an optimal multi-round policy, building on the 
optimal policy for single instance computation. 



II. RELATED WORK 

The problem of worst-case block function computation 
with zero error was formulated in [1]. The authors identify 
two classes of symmetric functions, namely type-sensitive 
functions, exemplified by Mean, Median and Mode, and 
type-threshold functions, exemplified by Maximum and Minimum. 
The maximum rates for computation of type-sensitive and 
type-threshold functions in random planar networks are shown 
to be $\Theta(1/\log n)$ and $\Theta(1/\log\log n)$ respectively, where $n$ is the 
number of nodes. If we impose a probability distribution on 
the node measurements, one can show that the average case 
complexity of computing type-threshold functions is $O(1)$ [2]. 

In this paper, we require that every node must compute the 
function. This approach naturally allows the use of tools from 
communication complexity [3]. In communication complexity 
[3], we seek to find the minimum number of bits that must 
be exchanged between two nodes to achieve worst-case 
zero-error computation of a function of the node variables. The 
problem of worst-case Boolean function computation was 
first considered in [4], where the complexity of the Boolean 
AND function was shown to be $\log_2 3$ bits. In [5], this was 
considerably generalized to derive the exact complexity of 
computing Boolean threshold functions. 

If the measurements are drawn from some joint probability 
distribution and one is allowed block computation, we arrive 
at a distributed source coding problem with a fidelity criterion 
that is function-dependent, for which little is known. One 
special case, a source coding problem for function computation 
with side information, has been studied in [6]. The problem 
of interactive function computation in collocated networks has 
been studied in [7]. 

The problem of minimizing the depth of decision trees for 
Boolean threshold queries is considered in [8]. In [9], an 
interesting problem in sequential decision making is studied, 
where $n$ nodes have i.i.d. measurements, and a central agent 
wishes to know the identities of the nodes with the $k$ largest 
values. One is allowed questions of the type "Is $X > t$?", 
to which the central agent receives the list of all nodes 
which satisfy the condition. Under this framework, the optimal 
recursive strategy of querying the nodes is found. A key 
difference in our formulation is that we are only allowed to 
query particular nodes, and not all nodes at once. 

III. OPTIMAL ORDERING FOR SINGLE INSTANCE COMPUTATION 

Consider a collocated network with nodes 1 through $n$, 
where each node $i$ has a Boolean measurement $X_i \in \{0, 1\}$. The 
$X_i$s are independent of each other and drawn from a Bernoulli 
distribution with $P(X_i = 1) =: p_i$. Without loss of generality, 
we assume that $p_1 \leq p_2 \leq \cdots \leq p_n$. 

We address the following problem. Every node wants to 
compute the same function $f(X_1, X_2, \ldots, X_n)$ of the 
measurements. We seek to find communication schemes which achieve 
correct function computation at each node, with minimum 
expected total number of bits exchanged. Throughout this 
paper, we consider the broadcast scenario where each node's 
transmission can be heard by every other node. We also 
suppose that collisions do not convey information, thus restricting 
ourselves to collision-free strategies as in [1]. This means 
that for the $k$th bit $b_k$, the identity of the transmitting node 
$T_k$ depends only on previously broadcast bits $b_1, b_2, \ldots, b_{k-1}$, 
while the value of the bit it sends can depend arbitrarily on all 
previous broadcast bits as well as its own measurement $X_{T_k}$. 

First, we note that since each node has exactly one bit of 
information, it is optimal to set $b_k = X_{T_k}$. Indeed, for any 
other choice $b'_k = g(b_1, \ldots, b_{k-1}, X_{T_k})$, the remaining nodes can 
reconstruct $b'_k$ from $X_{T_k}$, since they already know $b_1, \ldots, b_{k-1}$. Thus the 
only freedom available is in choosing the transmitting node $T_k$ 
as a function of $b_1, b_2, \ldots, b_{k-1}$, for otherwise the transmission 
itself could be avoided. We call this the ordering problem. 
Thus, by definition, the order can dynamically depend on the 
previous broadcast bits. In this paper, we address the ordering 
problem for a class of Boolean functions, namely threshold 
functions. 

Notation: The set of measurements of nodes 1 through 
$n$ is denoted by $(X_1, X_2, \ldots, X_n)$, which is abbreviated as 
$X^n$. In the sequel, we will use $X^n_{-i}$ to denote the set of 
measurements $(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$. As a natural 
extension, we use $X^n_{-(i,j)}$ to denote the set of measurements 
$(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n)$, where $i < j$. 

A. Optimal ordering for computing Boolean threshold func- 
tions 

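Definition 1 (Boolean threshold functions): A Boolean 
threshold function $\Pi_\theta(X_1, X_2, \ldots, X_n)$ is defined as 

$$\Pi_\theta(X_1, X_2, \ldots, X_n) = \begin{cases} 1 & \text{if } \sum_{i} X_i \geq \theta, \\ 0 & \text{otherwise.} \end{cases}$$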

Given a function $\Pi_{n-k}(X^n)$, the ordering problem can 
indeed be solved using dynamic programming. Let $C(\Pi_{n-k}(X^n))$ 
denote the minimum expected number of bits required to 
compute $\Pi_{n-k}(X^n)$. The dynamic programming equation is 

$$C(\Pi_{n-k}(X^n)) = \min_{1 \leq i \leq n} \left\{ 1 + p_i\, C(\Pi_{n-k-1}(X^n_{-i})) + (1-p_i)\, C(\Pi_{n-k}(X^n_{-i})) \right\}.$$

However, solving this equation is computationally complex. 
Further, it is unclear at the outset if the optimal strategy will 
depend only on the ordering of the $p_i$s, or on their particular 
values. This makes the explicit solution of the dynamic programming 
equation for all $n$, $k$ and $(p_1, p_2, \ldots, p_n)$ notoriously hard. We present a very simple 
characterization of the optimal strategy for each $n$ and 
$0 \leq k \leq n-1$, and show that it is independent of the particular 
values of the $p_i$s and depends only on their ordering. 
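For small networks, the dynamic programming equation can be evaluated by exhaustive recursion over the set of nodes that have not yet transmitted. The following Python sketch is illustrative only: the function name `min_expected_bits` and the 0-based node indexing are assumptions made here, and the threshold function is taken to be 1 when at least `theta` measurements equal 1. It returns the minimum expected number of bits together with one optimal first transmitter.

```python
from functools import lru_cache


def min_expected_bits(p, theta):
    """Brute-force solution of the dynamic programming equation.

    p     : sequence with p[i] = P(X_i = 1), measurements independent
    theta : threshold; the function is 1 iff sum(X) >= theta

    Returns (cost, first): the minimum expected number of broadcast
    bits and the 0-based index of one optimal first transmitter
    (None if the function value is already determined).
    """
    p = tuple(p)

    @lru_cache(maxsize=None)
    def solve(remaining, need):
        # remaining: tuple of nodes that have not transmitted yet
        # need: number of further 1s required for the function to be 1
        if need <= 0 or need > len(remaining):
            return 0.0, None  # value already determined, no bits needed
        best_cost, best_node = float("inf"), None
        for i in remaining:
            rest = tuple(j for j in remaining if j != i)
            cost = (1.0
                    + p[i] * solve(rest, need - 1)[0]
                    + (1.0 - p[i]) * solve(rest, need)[0])
            if cost < best_cost - 1e-12:
                best_cost, best_node = cost, i
        return best_cost, best_node

    return solve(tuple(range(len(p))), theta)


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.9]  # already sorted in increasing order
    for theta in (1, 2, 3):
        print(theta, min_expected_bits(probs, theta))
```

The recursion visits every subset of nodes, so its running time grows exponentially in $n$; it is useful only as a numerical check on small instances.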

To begin with, we argue that solving the ordering problem 
for Boolean threshold functions is equivalent to solving the 
following problem for each $n$ and $k$: in the optimal strategy 
for computing $\Pi_{n-k}(X_1, X_2, \ldots, X_n)$, determine which node must 
transmit first. Indeed, if $T(1)$ is the first node to transmit under 
the optimal strategy, then, depending on whether $X_{T(1)} = 0$ 
or $X_{T(1)} = 1$, the rest of the nodes would need to compute 
$\Pi_{n-k}(X^n_{-T(1)})$ or $\Pi_{n-k-1}(X^n_{-T(1)})$. Since we have solved the 
problem for all $n$ and $k$, we can determine which node should 
transmit next in either case. 



Theorem 1: In order to compute the Boolean threshold 
function $\Pi_{n-k}(X^n)$, it is optimal for node $k+1$ to transmit 
first. This result is true for all $n$, all $0 \leq k \leq n-1$ and all 
probability distributions with $p_1 \leq p_2 \leq \cdots \leq p_n$. 
Proof: Define $\tilde{C}(\Pi_{n-k}(X^n)) := C(\Pi_{n-k}(X^n)) - 1$ for notational 
convenience. We also define the following expressions. 

$$T_{m,k,i}(X^m) := p_{k+1}\,\tilde{C}(\Pi_{m-k-1}(X^m_{-(k+1)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{m-k}(X^m_{-(k+1)})) - p_i\,\tilde{C}(\Pi_{m-k-1}(X^m_{-i})) - (1-p_i)\,\tilde{C}(\Pi_{m-k}(X^m_{-i})). \tag{1}$$

$T_{m,k,i}$ is the difference between the expected number of bits 
when node $k+1$ transmits first, and the expected number of 
bits when node $i$ transmits first. 

$$S^{(1)}_{m,k,i}(X^m) := (p_{k+1}-p_i)\,C(\Pi_{m-k-1}(X^m_{-(k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{m-k}(X^m_{-(k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{m-k}(X^m_{-i})) \tag{2}$$

$$S^{(2)}_{m,k,i}(X^m) := (p_i-p_{k+1})\,C(\Pi_{m-k-1}(X^m_{-(i,k+1)})) + p_{k+1}\,\tilde{C}(\Pi_{m-k-1}(X^m_{-(k+1)})) - p_i\,\tilde{C}(\Pi_{m-k-1}(X^m_{-i})) \tag{3}$$

We do not yet have an interpretation for $S^{(1)}_{m,k,i}$ and $S^{(2)}_{m,k,i}$. 
However, we will use these expressions in the sequel. 
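As an illustration of the sign of $T_{m,k,i}$, consider the smallest non-trivial case $m = 3$, $k = 1$, $i = 3$, where the function is $\Pi_2(X^3)$ and node 2 is compared with node 3 as first transmitter. The worked example below is a sketch added for concreteness. It assumes the two-node costs $C(\Pi_1(X_a, X_b)) = 2 - \max(p_a, p_b)$ and $C(\Pi_2(X_a, X_b)) = 1 + \min(p_a, p_b)$, which follow from enumerating both orderings of two nodes, and it writes $T$ directly as the difference of the two expected costs.

```latex
\begin{align*}
T_{3,1,3}(X^3)
 &= \bigl[1 + p_2\,C(\Pi_1(X_1,X_3)) + (1-p_2)\,C(\Pi_2(X_1,X_3))\bigr] \\
 &\quad - \bigl[1 + p_3\,C(\Pi_1(X_1,X_2)) + (1-p_3)\,C(\Pi_2(X_1,X_2))\bigr] \\
 &= p_2(2-p_3) + (1-p_2)(1+p_1) - p_3(2-p_2) - (1-p_3)(1+p_1) \\
 &= (p_2 - p_3)(1 - p_1) \;\leq\; 0 .
\end{align*}
```

The sign depends only on $p_2 \leq p_3$ and $p_1 \leq 1$, not on the particular values, which is the behaviour the theorem asserts in general.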

We establish the above theorem by induction on the number 
of nodes $n$. However, we need to load the induction hypothesis. 
Consider the following induction hypothesis. 

(a) $T_{m,k,i}(X^m) \leq 0$ for all $0 \leq k \leq m-1$, $1 \leq i \leq m$ 

(b) $S^{(1)}_{m,k,i}(X^m) \leq 0$ for all $0 \leq k \leq m-1$, $k+2 \leq i \leq m$ 

(c) $S^{(2)}_{m,k,i}(X^m) \leq 0$ for all $0 \leq k \leq m-1$, $1 \leq i \leq k$ 

Observe that part (a) immediately establishes that node $k+1$ should 
transmit first in the optimal strategy for computing the function 
$\Pi_{m-k}(X^m)$. 

The basis step for $m = 1$, $k = 1$ is trivially true. Let us 
suppose the induction hypothesis is true for all $m \leq n$. We 
now proceed to prove the hypothesis for $m = n+1$. 

Lemma 1: For fixed $k$ and $i \geq k+2$, we have 
$S^{(1)}_{n+1,k,i}(X^{n+1}) \leq 0$. 
Proof: See Appendix A. 

Lemma 2: For fixed $k$ and $i \leq k$, we have 
$S^{(2)}_{n+1,k,i}(X^{n+1}) \leq 0$. 
Proof: See Appendix B. 

Lemmas 1 and 2 establish the induction step for parts (b) 
and (c) of the induction hypothesis. We now proceed to show 
the induction step for part (a). 

Lemma 3: For fixed $k$ and $i \geq k+2$, we have 
$T_{n+1,k,i}(X^{n+1}) \leq S^{(1)}_{n+1,k,i}(X^{n+1})$. 
Proof: See Appendix C. 

Lemma 4: For fixed $k$ and $i \leq k$, we have 
$T_{n+1,k,i}(X^{n+1}) \leq S^{(2)}_{n+1,k,i}(X^{n+1})$. 
Proof: See Appendix D. 

Using Lemmas 3 and 4 together with Lemmas 1 and 2, we 
see that $T_{n+1,k,i}(X^{n+1}) \leq 0$ for all $0 \leq k \leq n$ and $i \neq k+1$. For 
the case $i = k+1$, we have $T_{n+1,k,k+1} = 0$ trivially. This 
completes the induction step for part (a), and the proof of the 
Theorem. □ 
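Applied recursively, Theorem 1 gives a complete dynamic ordering: at every step, if $m$ nodes remain and the residual threshold is $m - k$, the remaining node with the $(k+1)$-th smallest probability transmits. The Python sketch below simulates this rule on one realization and computes its expected cost by enumeration. The names `run_indexing_policy` and `expected_bits` are hypothetical, nodes are indexed from 0, and the threshold function is assumed to be 1 when at least `theta` measurements equal 1.

```python
from itertools import product


def run_indexing_policy(p, x, theta):
    """Simulate the ordering rule of Theorem 1 on one realization.

    p     : list with p[i] = P(X_i = 1), in any order
    x     : realized bits, x[i] in {0, 1}
    theta : threshold; the function is 1 iff sum(x) >= theta

    Returns (value, order): the function value and the 0-based node
    indices in the order in which they transmitted.
    """
    # Nodes that have not transmitted, sorted by increasing probability.
    remaining = sorted(range(len(p)), key=lambda i: p[i])
    need = theta  # number of further 1s required
    order = []
    # The value is undetermined exactly while 0 < need <= len(remaining).
    while 0 < need <= len(remaining):
        k = len(remaining) - need   # residual threshold is (m - k)
        node = remaining.pop(k)     # node with the (k+1)-th smallest p
        order.append(node)
        need -= x[node]
    return (1 if need <= 0 else 0), order


def expected_bits(p, theta):
    """Expected number of transmitted bits under the rule above."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        prob = 1.0
        for pi, xi in zip(p, x):
            prob *= pi if xi else 1.0 - pi
        total += prob * len(run_indexing_policy(p, x, theta)[1])
    return total


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.9]
    print(run_indexing_policy(probs, [1, 1, 1], 2))  # second transmitter is node 2
    print(run_indexing_policy(probs, [0, 0, 1], 2))  # second transmitter is node 0
    print(expected_bits(probs, 2))
```

The two sample runs show the dynamic nature of the policy: after the first transmitted bit, the next transmitter depends on whether that bit was 0 or 1, while the rule itself uses only the ordering of the probabilities.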



IV. OPTIMAL ORDERING FOR BLOCK COMPUTATION 

We now shift attention to the case where we allow for nodes 
to accumulate a block of measurements, and thus achieve 
improved efficiency by using block codes. We consider the 
class of all interactive strategies for computation, where the 
kth bit can depend arbitrarily on all previously broadcast bits. 
We require that all nodes compute the function with zero error 
for the block. We present a conjecture for the optimal strategy 
based on the insight gained from the single instance solution. 

Conjecture 1: In order to compute the Boolean threshold 
function $\Pi_{n-k}(X^n)$, it is optimal for node $k+1$ to transmit 
first, using the Huffman code. This result is true for all $n$, 
all $0 \leq k \leq n-1$ and all probability distributions with 
$p_1 \leq p_2 \leq \cdots \leq p_n$. 

Observe that after node $k+1$ transmits, we are left with two 
block computation problems. For the instances where $X_{k+1} = 0$, 
we need to compute $\Pi_{n-k}(X^n_{-(k+1)})$, and for the instances 
where $X_{k+1} = 1$, we need to compute $\Pi_{n-k-1}(X^n_{-(k+1)})$. Thus 
the conjectured strategy can be recursively applied, yielding 
an interactive multi-round strategy. However, proving the 
optimality of this strategy is significantly harder. For 
worst-case block computation, the lower bound is established using 
fooling sets [5]. Adapting this idea to the probabilistic scenario 
remains an interesting challenge for the future. 
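To give a feel for the gain available from block coding in the first round of the conjectured strategy, the following Python sketch computes the expected length of a binary Huffman code for a block of i.i.d. Bernoulli measurements held by a single node. It illustrates only that first transmission, not the full multi-round strategy, and the function names `huffman_expected_length` and `bits_per_measurement` are hypothetical.

```python
import heapq
from itertools import product


def huffman_expected_length(probs):
    """Expected codeword length, in bits, of a binary Huffman code
    for symbols with the given probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath
        # them, so the expected length grows by their total probability.
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def bits_per_measurement(p, block_len):
    """Expected Huffman bits per measurement when one node encodes a
    block of `block_len` i.i.d. Bernoulli(p) measurements."""
    probs = []
    for x in product((0, 1), repeat=block_len):
        ones = sum(x)
        probs.append(p ** ones * (1.0 - p) ** (block_len - ones))
    return huffman_expected_length(probs) / block_len


if __name__ == "__main__":
    for n_block in (1, 2, 4, 8):
        print(n_block, bits_per_measurement(0.1, n_block))
```

With a block of length 1 the node spends exactly one bit per measurement, as in the single instance case; for biased measurements, longer blocks reduce the expected number of bits per measurement.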

V. CONCLUDING REMARKS 

We have considered a sequential decision problem that 
arises in the context of optimal computation of Boolean 
threshold functions in collocated networks. For single instance 
computation, we show that the optimal strategy has an elegant 
structure, which depends only on the ordering of the marginal 
probabilities, and not on their exact values. The extension 
to the case of block computation is harder and remains a 
challenge for the future. It would also be interesting to extend this 
result to the case of correlated measurements. 

REFERENCES 

[1] A. Giridhar and P. R. Kumar. Computing and communicating functions 
over sensor networks. IEEE Journal on Selected Areas in Communications, 
23(4):755-764, April 2005. 

[2] H. Kowshik and P. R. Kumar. Zero-error function computation in sensor 
networks. In Proceedings of the 48th IEEE Conference on Decision and 
Control (CDC), December 2009. 

[3] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge 
University Press, 1997. 

[4] R. Ahlswede and Ning Cai. On communication complexity of vector- 
valued functions. IEEE Transactions on Information Theory, 40:2062- 
2067, 1994. 

[5] H. Kowshik and P. R. Kumar. Optimal strategies for computing boolean 
functions in collocated networks. In Proceedings of the Information 
Theory Workshop, Cairo, January 2010. 

[6] A. Orlitsky and J. R. Roche. Coding for computing. IEEE Transactions 
on Information Theory, 47:903-917, 2001. 

[7] N. Ma, P. Ishwar, and P. Gupta. Information-theoretic bounds for multi- 
round function computation in collocated networks. In IEEE International 
Symposium on Information Theory (ISIT), 2009. 

[8] Yosi Ben-Asher and Ilan Newman. Decision trees with boolean threshold 
queries. Journal of Computer and System Sciences, 51, 1995. 

[9] K. J. Arrow, L. Pesotchinsky, and M. Sobel. On partitioning of a sample 
with binary-type questions in lieu of collecting observations. Journal of 
the American Statistical Association, 76(374):402-409, June 1981. 



APPENDIX 



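A. Proof of Lemma 1 

\begin{align*}
S^{(1)}_{n+1,k,i}(X^{n+1})
 &= (p_{k+1}-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-(k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-i})) \\
 &= (p_{k+1}-p_i)\left[1 + p_k\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) + (1-p_k)\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1,i)}))\right] \\
 &\quad + (1-p_{k+1})\left[p_k\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) + (1-p_k)\,C(\Pi_{n-k+1}(X^{n+1}_{-(k,k+1)}))\right] \\
 &\quad - (1-p_i)\left[p_k\,C(\Pi_{n-k}(X^{n+1}_{-(k,i)})) + (1-p_k)\,C(\Pi_{n-k+1}(X^{n+1}_{-(k,i)}))\right] \tag{4} \\
 &= p_k\left[(p_{k+1}-p_i)\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,i)}))\right] \\
 &\quad + (1-p_k)\left[(p_{k+1}-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-(k,k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-(k,i)}))\right] \\
 &= p_k\left[(p_{k+1}-p_i)\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,i)}))\right] \\
 &\quad + (1-p_k)\,S^{(1)}_{n,k-1,i}(X^{n+1}_{-k}) \\
 &\leq p_k\left[(p_{k+1}-p_i)\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,i)}))\right] \tag{5} \\
 &= p_k\Bigl[(p_{k+1}-p_i)\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) + (1-p_{k+1})\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) \\
 &\qquad\quad - (1-p_i)\left[p_{k+1}\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) + (1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1,i)}))\right]\Bigr] \tag{6} \\
 &= p_k(1-p_{k+1})\left[\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) - p_i\,C(\Pi_{n-k-1}(X^{n+1}_{-(k,k+1,i)})) - (1-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1,i)}))\right] \\
 &\leq 0. \tag{7}
\end{align*}

Equation (4) follows from the optimal ordering for computing $\Pi_{n-k}(X^{n+1}_{-(k+1,i)})$, $\Pi_{n-k+1}(X^{n+1}_{-(k+1)})$ and $\Pi_{n-k+1}(X^{n+1}_{-i})$, which 
is true by the induction hypothesis for $m \leq n$. The regrouping after (4) uses $C = \tilde{C} + 1$; the constant terms cancel since 
$(p_{k+1}-p_i) + (1-p_{k+1}) - (1-p_i) = 0$. The inequality (5) follows from the induction hypothesis that $S^{(1)}_{n,k-1,i}(X^{n+1}_{-k}) \leq 0$. 
Equation (6) and inequality (7) follow from the optimal ordering for computing $\Pi_{n-k}(X^{n+1}_{-(k,i)})$ and $\Pi_{n-k}(X^{n+1}_{-(k,k+1)})$ respectively. □ 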

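B. Proof of Lemma 2 

\begin{align*}
S^{(2)}_{n+1,k,i}(X^{n+1})
 &= (p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})) + p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-i})) \\
 &= (p_i-p_{k+1})\left[1 + p_{k+2}\,C(\Pi_{n-k-1}(X^{n+1}_{-(i,k+1,k+2)})) + (1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)}))\right] \\
 &\quad + p_{k+1}\left[p_{k+2}\,C(\Pi_{n-k-1}(X^{n+1}_{-(k+1,k+2)})) + (1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)}))\right] \\
 &\quad - p_i\left[p_{k+2}\,C(\Pi_{n-k-1}(X^{n+1}_{-(i,k+2)})) + (1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+2)}))\right] \tag{8} \\
 &= p_{k+2}\left[(p_i-p_{k+1})\,C(\Pi_{n-k-1}(X^{n+1}_{-(i,k+1,k+2)})) + p_{k+1}\,\tilde{C}(\Pi_{n-k-1}(X^{n+1}_{-(k+1,k+2)})) - p_i\,\tilde{C}(\Pi_{n-k-1}(X^{n+1}_{-(i,k+2)}))\right] \\
 &\quad + (1-p_{k+2})\left[(p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)})) + p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(i,k+2)}))\right] \\
 &= p_{k+2}\,S^{(2)}_{n,k,i}(X^{n+1}_{-(k+2)}) \\
 &\quad + (1-p_{k+2})\left[(p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)})) + p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(i,k+2)}))\right] \\
 &\leq (1-p_{k+2})\left[(p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)})) + p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(i,k+2)}))\right] \tag{9} \\
 &= (1-p_{k+2})\Bigl[(p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)})) + p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) \\
 &\qquad\quad - p_i\left[p_{k+1}\,C(\Pi_{n-k-1}(X^{n+1}_{-(i,k+1,k+2)})) + (1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)}))\right]\Bigr] \tag{10} \\
 &= (1-p_{k+2})\,p_{k+1}\left[\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) - p_i\,C(\Pi_{n-k-1}(X^{n+1}_{-(i,k+1,k+2)})) - (1-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1,k+2)}))\right] \\
 &\leq 0. \tag{11}
\end{align*}

Equation (8) follows from the optimal ordering for computing $\Pi_{n-k}(X^{n+1}_{-(i,k+1)})$, $\Pi_{n-k}(X^{n+1}_{-(k+1)})$ and $\Pi_{n-k}(X^{n+1}_{-i})$, which follows 
from the induction hypothesis for $m \leq n$. The regrouping after (8) uses $C = \tilde{C} + 1$; the constant terms cancel since 
$(p_i-p_{k+1}) + p_{k+1} - p_i = 0$. The inequality (9) follows from the induction hypothesis that $S^{(2)}_{n,k,i}(X^{n+1}_{-(k+2)}) \leq 0$. 
Equation (10) and inequality (11) follow from the optimal ordering for computing $\Pi_{n-k}(X^{n+1}_{-(i,k+2)})$ and $\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})$ respectively. 
□ 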



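C. Proof of Lemma 3 
First, we observe that 

$$T_{n+1,k,i}(X^{n+1}) - S^{(1)}_{n+1,k,i}(X^{n+1}) = p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-i})) - (p_{k+1}-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})).$$

Thus it is enough to show that 

$$p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-i})) \leq (p_{k+1}-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})) \quad \text{for } i \geq k+2.$$

\begin{align*}
 & p_{k+1}\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-(k+1)})) - p_i\,\tilde{C}(\Pi_{n-k}(X^{n+1}_{-i})) \\
 &= p_{k+1}\left[p_{k+2}\,C(\Pi_{n-k-1}(X^{n+1}_{-(k+1,k+2)})) + (1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)}))\right] \\
 &\quad - p_i\left[p_{k+1}\,C(\Pi_{n-k-1}(X^{n+1}_{-(k+1,i)})) + (1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)}))\right] \tag{12} \\
 &= p_{k+1}\left[p_{k+2}\,C(\Pi_{n-k-1}(X^{n+1}_{-(k+1,k+2)})) - p_i\,C(\Pi_{n-k-1}(X^{n+1}_{-(k+1,i)}))\right] \\
 &\quad + p_{k+1}(1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) - p_i(1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})) \\
 &\leq p_{k+1}\left[(1-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})) - (1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)}))\right] \\
 &\quad + p_{k+1}(1-p_{k+2})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,k+2)})) - p_i(1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})) \tag{13} \\
 &= (p_{k+1}-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(k+1,i)})).
\end{align*}

Equation (12) follows from the optimal order for computing $\Pi_{n-k}(X^{n+1}_{-(k+1)})$ and $\Pi_{n-k}(X^{n+1}_{-i})$. The inequality in (13) follows from 
the induction hypothesis $T_{n,k,i}(X^{n+1}_{-(k+1)}) \leq 0$, whose value is unchanged when $\tilde{C}$ is replaced by $C$ since its coefficients sum to zero. □ 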



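D. Proof of Lemma 4 
First, we observe that 

$$T_{n+1,k,i}(X^{n+1}) - S^{(2)}_{n+1,k,i}(X^{n+1}) = (1-p_{k+1})\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-(k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-i})) - (p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})).$$

Thus it is enough to show that 

$$(1-p_{k+1})\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-(k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-i})) \leq (p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})) \quad \text{for } i \leq k.$$

\begin{align*}
 & (1-p_{k+1})\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-(k+1)})) - (1-p_i)\,\tilde{C}(\Pi_{n-k+1}(X^{n+1}_{-i})) \\
 &= (1-p_{k+1})\left[p_k\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) + (1-p_k)\,C(\Pi_{n-k+1}(X^{n+1}_{-(k,k+1)}))\right] \\
 &\quad - (1-p_i)\left[p_{k+1}\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})) + (1-p_{k+1})\,C(\Pi_{n-k+1}(X^{n+1}_{-(i,k+1)}))\right] \tag{14} \\
 &= (1-p_{k+1})\left[(1-p_k)\,C(\Pi_{n-k+1}(X^{n+1}_{-(k,k+1)})) - (1-p_i)\,C(\Pi_{n-k+1}(X^{n+1}_{-(i,k+1)}))\right] \\
 &\quad + p_k(1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) - p_{k+1}(1-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})) \\
 &\leq (1-p_{k+1})\left[p_i\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})) - p_k\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1)}))\right] \\
 &\quad + p_k(1-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(k,k+1)})) - p_{k+1}(1-p_i)\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})) \tag{15} \\
 &= (p_i-p_{k+1})\,C(\Pi_{n-k}(X^{n+1}_{-(i,k+1)})).
\end{align*}

Equation (14) follows from the optimal order for computing $\Pi_{n-k+1}(X^{n+1}_{-(k+1)})$ and $\Pi_{n-k+1}(X^{n+1}_{-i})$. The inequality in (15) follows 
from the induction hypothesis $T_{n,k-1,i}(X^{n+1}_{-(k+1)}) \leq 0$, whose value is unchanged when $\tilde{C}$ is replaced by $C$ since its coefficients sum to zero. □ 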



